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Abstract 

We  develop  a  new  class  of  overlapping  Schwarz  type  algorithms  for  solving  scalar  convection- 
diffusion  equations  discretized  by  finite  element  or  finite  difference  methods.  The  precon¬ 
ditioners  consist  of  two  components,  namely,  the  usual  two-level  additive  Schwarz  precon¬ 
ditioner  and  the  sum  of  some  quadratic  terms  constructed  by  using  products  of  ordered 
neighboring  subdomain  preconditioners.  The  ordering  of  the  subdomain  preconditioners  is 
determined  by  considering  the  direction  of  the  flow.  We  prove  that  the  algorithms  are  opti¬ 
mal  in  the  sense  that  the  convergence  rates  are  independent  of  the  mesh  size,  as  well  as  the 
number  of  subdomains.  We  show  by  numerical  examples  that  the  new  algorithms  are  less 
sensitive  to  the  direction  of  the  flow  than  either  the  classical  multiplicative  Schwarz  algo¬ 
rithms,  and  converge  faster  than  the  additive  Schwarz  algorithms.  Thus,  the  new  algorithms 
are  more  suitable  for  fluid  flow  applications  than  the  classical  additive  or  multiplicative 
Schwarz  algorithms. 
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1.  Introduction.  In  this  paper,  we  present  some  new  overlapping  domain  decompo¬ 
sition  methods  for  the  numerical  solution  of  large,  sparse,  non  symmetric,  and/or  indefinite 
linear  systems  of  equations  arising  from  Galerkin  finite  element  discretizations  of  elliptic  par¬ 
tial  differential  equations.  The  new  algorithms  belong  to  the  family  of  overlapping  Schwarz 
methods  which  is  a  variant  of  the  classical  Schwarz  alternating  algorithm,  introduced  in  1870 
by  H.  A.  Schwarz  [22]  in  an  existence  proof  of  elliptic  boundary  value  problems  defined  in 
certain  irregular  regions.  This  family  of  methods  has  attracted  much  attention  in  the  past 
few  years  as  convenient  and  powerful  in  the  solution  of  partial  differential  equations,  see, 
especially  on  parallel  machines.  For  tutorial  presentations,  see,  e.g.  [7,23]. 

We  shall  focus  on  linear  nonsymmetric  and/or  indefinite  second-order  elliptic  finite  ele¬ 
ment  or  finite  difference  equations.  The  solution  of  such  problems  is  an  important  compu¬ 
tational  kernel  in  implicit  methods,  such  as  solving  systems  involving  the  Jacobian  in  any 
Newton-like  method  used  in  computational  fluid  dynamics,  [3,  4,  24].  This  family  of  methods 
is  built  upon  the  so-called  subdomain  mapping  operators  T,-,  which  solve  the  original  prob¬ 
lem,  defined  on  a  domain  12,  approximately  in  subdomains  f2[  C  f2  with  artificial  boundary 
conditions  and  zero  extensions  to  f2  —  f2[.  The  formal  definitions  of  Tt  and  will  be  given  in 
the  next  section.  By  using  these  T? s  as  basic  building  blocks,  a  family  of  polynomial  Schwarz 
algorithms  can  be  defined.  Let  N  be  the  number  of  subdomains  and  T0  the  coarse  space 
mapping  operator.  We  define 

T  =  poly  (To,  T\,  •  •  • ,  Tn) 

as  a  multi-dimensional  matrix-valued  polynomial  with  variables  Tt-,  and  assume  that  the 
polynomial  satisfies  poly( 0,  •  •  •  ,0)  =0,  which  simply  means  that  the  constant  term  in  the 
polynomial  is  zero.  It  is  known  that  if  u*  is  the  exact  solution  of  the  finite  element  equations 
then  Tu*  can  be  computed  without  knowing  u*  itself.  This  is  because  Ttu*,  i  =  0,  •  •  • ,  TV, 
can  be  computed  directly  from  the  right-hand  side  function  of  the  finite  element  equations. 
With  g  =  Tu*  as  the  new  right-hand  side  vector,  a  new  linear  system  can  be  introduced  as 

Tu  =  g 

and  it  is  not  difficult  to  show  that  if  T  is  nonsingular  then  the  new  linear  system  gives 
the  desired  finite  element  solution  u*.  For  each  choice  of  the  polynomial  poly ,  a  particular 
Schwarz  algorithm  is  defined.  The  algorithm  is  called  optimal  if  the  condition  number,  or 
some  other  “equivalent  measure”  for  nonsymmetric  or  indefinite  problems,  of  the  operator 
T  is  independent  of  the  mesh  parameter  h  and  the  number  of  subdomains  N.  Several  such 
optimal  algorithms,  such  as  the  additive  ( T  =  and  multiplicative  ( T  =  I-UfL0(I- 

T{ ))  Schwarz  algorithms,  have  been  identified.  Generally,  the  additive  algorithms  have  two 
features  among  others: 

•  They  converge  more  slowly  than  the  multiplicative  algorithms  because  of  the  lack 
of  subdomain-to-subdomain  communications  within  each  iteration; 

•  Their  convergence  is  independent  of  the  ordering  and  coloring  of  the  subdomains. 
The  features  of  the  multiplicative  algorithms  include: 

•  They  are  faster  in  terms  of  the  total  iteration  number; 
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•  They  are  not  as  parallel  as  the  additive  algorithms  because  of  the  data  dependence 
between  overlapping  subdomains 

•  They  have  a  strong  dependence  on  the  global  ordering  and  coloring  of  subdomains 
especially  for  convection-diffusion  problems. 

See  the  last  section  of  this  paper  for  a  detailed  discussion  on  the  ordering  and  coloring 
issues.  To  use  the  multiplicative  algorithms  efficiently,  it  is  important  to  color  and  order 
the  subdomains  correctly.  However,  to  obtain  the  optimal  coloring  and  ordering  is  difficult 
in  practice  especially  when  the  underlying  mesh  is  unstructured  and  the  subdomains  are 
obtained  by  means  of  graph  partitioning,  see,  e.g.,  [3,  10].  For  a  particular  problem  and  a 
given  subdomain  partitioning,  it  is  not  impossible  to  obtain  a  reasonable  subdomain  coloring 
and  ordering  according  to  certain  practical  heuristics,  but,  in  general,  especially  for  unsteady 
problems  where  the  flow  direction  changes  from  time  step  to  time  step,  it  becomes  desirable 
to  have  algorithms  that  do  not  need,  or  depend  less  on,  manual  subdomain  ordering  and 
coloring.  Extensive  discussions  on  the  effects  of  ordering  and  coloring  of  nodes  or  elements, 
in  the  context  of  iterative  and  direct  sparse  matrix  computations,  can  be  found  in  many 
research  papers,  see,  e.g.,  [1,  9,  15,  17].  Some  of  the  ideas  and  techniques  can  also  be 
applied,  with  certain  modifications,  to  the  coloring  and  ordering  of  overlapping  subdomains. 
We  will  not  consider  these  techniques  in  this  paper,  since  our  interest  is  in  automating  the 
construction  of  the  preconditioner. 

In  this  paper,  we  shall  identify  some  overlapping  Schwarz  algorithms,  which  we  call  the 
local  multiplicative  Schwarz  methods.  The  new  algorithms  are  not  only  optimal  but  also 
have  convergence  rates  that  are: 

•  better  than  that  of  the  additive  Schwarz  method; 

•  not  sensitive  to  the  coloring  and  global  ordering  of  the  subdomains,  nor  the  flow 
direction; 

•  more  parallel  than  the  multiplicative  Schwarz  algorithm. 

Our  basic  idea  is  to  use  the  multiplicative  Schwarz  algorithm  only  locally  between  those 
pairs  of  overlapping  subdomains  for  which  we  have  effective  techniques  to  determine  the 
flow  direction  without  any  global  operations.  We  use  additive  techniques  to  handle  the 
global  communication  between  pairs  of  subdomains  and  the  coarse  level  preconditioning. 

The  paper  is  organized  as  follows.  In  Section  2,  we  define  our  model  elliptic  problem,  its 
discretization  and  the  overlapping  partitioning  of  the  finite  element  mesh.  In  Section  3,  we 
introduce  and  analyze  the  new  local  multiplicative  Schwarz  algorithms  for  symmetric  and 
general  nonsymmetric  problems.  In  the  last  section  of  the  paper  we  provide  some  numerical 
examples  regarding  the  performance  of  the  new  algorithms,  as  well  as  some  comparisons 
with  the  classical  additive  and  multiplicative  Schwarz  algorithms. 

2.  Model  problems  and  subdomain  partitioning.  Let  0  be  an  open,  bounded 
polygonal  region  in  Rd,d  =  2  or  3,  with  boundary  50.  We  consider  the  homogeneous 
Dirichlet  boundary  value  problem 


=  f{x)  in 
=  0 


(1) 


Lu{x) 

u(x) 


2 


on 


0, 

50. 


Here  the  elliptic  operator  L  has  the  form  Lu(x )  =  —V  •  (Vu)  +  2/?(a;)  •  Vu  +  c(x)u.  All  the 
coefficients  are,  by  assumption,  sufficiently  smooth  and  the  right-hand  side  /  €  L2(£l).  We 
assume  that  the  equation  has  a  unique  solution  in  Hq(Q).  Let  (•,  •)  denote  the  usual  L2(Q) 
inner  product  and  ||  ■  ||  or  ||  •  \\i2  the  corresponding  norm.  The  weak  form  of  equation  (1)  is: 
Find  u  €  Hq(Q.)  such  that 

(2)  b(u,v)  =  (f,v),  Vu<E#o(^)- 

The  bilinear  form  b(u,  v )  is  defined  by 

b(u,v)  =  f  Vw  -Wvdx  +  f  (0-'Vu)vdx+  (  V  •  (/3u)vdx  +  [  (cuv)dx. 

J  Si  J  O  J  Si  J  Q 

Here  c(x)  =  c(x)  —  V  •  /?.  In  addition  to  the  following  bilinear  form 

a(u,v )  =  /  Vu-Vvdx , 

J 

which  is  used  as  the  usual  energy  inner  product  in  Hq(Q.)  with  norm  defined  by  ||u||a  = 
(a(«,  u))1/2,  we  also  use  two  other  bilinear  forms 

s(u,u)  =  f  (f3-'Vu)vdx+  f  V  •  (fiu)vdx 
J  Si  J  Si 

and  c(u.  v )  =  (cu,  v),  which  correspond  to  the  skew-symmetric  and  zeroth  order  parts  of  L, 
respectively.  It  is  easy  to  verify  that 

s(u,v )  =  — s(u,u),  Vu,  v  €  Ho(fl). 

Following  Dryja  and  Widlund  [13],  we  define  a  two-level  conforming  finite  element  trian¬ 
gulation  of  0.  The  region  is  first  divided  into  nonoverlapping  subdomains  Cl{,i  =  1,  •  •  • ,  N, 
such  that  Q  =  fi;.  Then  all  the  subdomains  fl;,  which  are  assumed  to  have  diameter 
of  order  H,  are  divided  into  triangular  elements  of  size  h.  We  assume  that  the  union  of  all 
of  the  elements  of  size  h ,  forms  a  regular  finite  element  triangulation  of  ft.  The  common 
assumption,  in  finite  element  theory  (cf.  [8]),  that  all  elements  are  shape  regular  is  adopted. 
With  such  a  triangulation,  we  let  Vh  C  Hq($1)  be  the  usual  piecewise  linear  continuous  finite 
element  space  on  f 2.  To  obtain  an  overlapping  decomposition  of  the  domain,  we  extend 
each  subdomain  Hi  to  a  larger  region  i.e.  0;  C  C  0.  We  assume  that  the  over¬ 
lap  is  uniformly  large  and  let  V{  =  14  fl#o(^)  C  14  be  the  usual  finite  element  subspace 
defined  over  with  zero  extension  to  Cl  —  Here  uniformly  large  overlap  means  that 
distance(5Q[  fl  0,  dttt  fl  Cl)  >  cH,  where  c  >  0  is  a  constant  independent  of  H.  It  is  clear 
that  0  =  U  Oj  and  14  =  Vo  T  14  T  ■  •  •  4-  Vn • 

The  finite  element  discretization  of  (2)  reads  as  follows:  Find  u*  €  14  such  that 

(3)  b(u*,v)  =  (f,v),  WveVh. 

Another  key  ingredient  in  the  design  of  optimal  domain  decomposition  preconditioners 
is  the  use  of  at  least  one  global  coarse  space,  which  in  a  way  connects  the  local  subdomains 


just  introduced.  A  number  of  coarse  spaces  have  been  introduced  in  the  literature,  see,  e.g., 
[11, 12].  We  shall  focus  only  on  a  simple  one.  Let  0#  =  {t,}  be  a  quasi-uniform  triangulation 
of  D  and  rt  one  of  the  triangles  with  a  diameter  on  the  order  of  H.  $Ih  is  the  coarse  grid. 
Let  Vo  be  the  piecewise  linear  continuous  finite  element  space  on  0# .  In  the  analysis  part 
of  this  paper  we  assume,  for  simplicity,  that  ho  C  14,  and  that  the  diameter  of  the  coarse 
elements  r,-  is  of  the  same  order  as  the  diameter  of  the  subdomains  0,-.  The  theory  can  easily 
be  extended  to  the  case  of  a  non-nested  coarse  [2],  and  to  cases  with  small  overlap  [14], 

In  the  numerical  experiments  section,  we  shall  present  some  cases  where  the  sizes  of  the 
subdomains  and  the  coarse  elements  are  of  different  order. 

For  each  i  =  0, 1, . . . ,  N,  we  define  a  mapping  operator  T,  :  Vh  -*■  V  by 

(4)  b(TiU,v)  =  b(u,v ),  Vu  €  14,  Vv  €  V  . 

These  T;  will  serve  as  the  basic  building  blocks  of  the  algorithms  to  be  discussed  in  the 
next  sections.  We  shall  mention  that  these  TVs  can  also  be  defined  inexactly  if  we  replace 
the  left-hand  side  bilinear  form  in  (4)  by  a  different  bilinear  form,  which,  in  some  sense  is 
equivalent  to  &(•,•).  Details  on  inexact  Schwarz  algorithms  can  be  found  in,  for  example, 

[5,  7,  23]. 

3.  New  algorithms  and  analysis.  In  this  section,  we  define  the  local  multiplicative 
Schwarz  algorithms  by  using  the  basic  Schwarz  building  blocks  Tt  defined  in  the  previous 
section.  For  each  pair  of  neighboring  subdomains,  with  indices  i  and  j ,  we  define  a  multi¬ 
plicative  Schwarz  operator 

Pij  =  /-(/-  Tj)(I  -  Ti). 

Note  that  for  any  u  €  14,  PjU  eVt  +  V3,  and  generally  Pt]  /  Pji,  unless  Dt-  and  %  have  no 
common  points.  Let 

(5)  P  =  T0  +  ££•;, 

where  the  summation  is  taken  over  all  possible  P;/ s.  Let  gi3  =  PijU*  and  g0  =  T0«*;  as 
mentioned  earlier,  both  can  be  computed  without  the  knowledge  of  u* .  With  g  =  g0  +  £  9ij, 
it  can  be  seen  that  if  the  operator  P  is  nonsingular,  then  the  linear  system 

(6)  Pu  =  9 

has  the  same  solution  as  that  of  (3).  We  shall  prove  in  the  remainder  of  the  paper  that  P 
is  indeed  nonsingular  and  uniformly  well-conditioned,  and  that  therefore  (6)  can  be  solved 
by  using  certain  Krylov  space  type  iterative  acceleration  methods,  such  as  CG  or  GMRES 
[21].  We  remark  that  if  the  bilinear  form  6(v)  is  symmetric,  then  the  operator  P  is  also 
symmetric  with  respect  to  b( •,  •).  In  other  words,  the  local  multiplicative  Schwarz  operator  P 
is  symmetric  if  both  P2J  and  Pji  are  included  in  its  definition.  Later,  in  this  section,  we  shall 
intentionally  destroy  the  symmetry  by  dropping  one  of  the  two  terms  when  solving  nonsym- 
metric  problems.  Keeping  only  the  terms  in  the  upwind  direction  makes  the  algorithm  very 
useful  for  convection- diffusion  equations.  Like  other  upwinding  type  discretization  schemes, 
we  shall  also  introduce  a  parameter  n  that  controls  the  amount  of  the  upwinding,  or  artificial 

diffusion,  in  the  Schwarz  preconditioning  polynomial. 
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3.1.  Analysis  for  the  symmetric  positive  definite  case.  Since  the  symmetric  posi¬ 
tive  definite  case  is  rather  simple,  we  consider  it  here.  Throughout  this  subsection  we  assume 
that  &(•,•)  =  a(-,-).  The  full  abstract  theory  of  Dryja  and  Widlund,  [13],  on  the  optimal 
convergence  of  the  additive  Schwarz  methods  cannot  be  used  directly  because  our  subprob¬ 
lem  operators  P,j  are  not  defined  as  projections.  We  summarize  the  results  of  the  symmetric 
case  in  the  following  theorem. 

THEOREM  1.  There  exist  positive  constants  c  and  C ,  independent  of  the  the  mesh 
parameters  h  and  H ,  such  that 

c|MI a  <  a(Pu,u)  <  C\\u\\l, 

for  any  u  6  14 

One  of  the  key  facts  that  we  shall  use  in  the  proof  of  the  theorem  is  given  in  the  following 
lemma  due  to  Dryja  and  Widlund  [13]. 

Lemma  3.1  (Dryja  and  Widlund[13]).  There  exist  positive  constants  c  and  C, 
independent  of  the  mesh  parameters,  such  that 

c||«€<£M<c|[<* 

i= 0 

for  any  u  (E  14 . 

We  remark  again  that  since  both  Pij  and  Pji  are  included  in  the  definition  of  P,  the 
operator  P  is  symmetric  with  respect  to  &(•,•)•  The  upper  bound  of  the  operator  P  can  be 
obtained  easily,  since 

II <  cm«\\i  +  \\TM\l) 

and 

\\PA\<cY,\\r<iAl 

By  using  Lemma  3.1,  we  obtain  that  ||Pu||a  <  CIMIa,  for  any  u  G  Vh  and  where  C  >  0  is  a 
constant  independent  of  the  mesh  parameters.  To  obtain  the  lower  bound,  we  note  that 

(7)  a(PijU,  u )  =  IIUttH*  +  \\Tju\\2a  -  a(TiU ,  7». 

Using  the  fact  that  a(TiU,Tju)  <  ||Ttu||a||Tju||a  <  l/2(||Ttu||^  +  ||Tju||^),  we  have 

a(P,i%u)>  i  (IM  +  im;)  . 

This  gives  the  lower  bound  when  combined  with  Lemma  3.1. 

We  remark  that  since  the  operator  P  is  symmetric  and  positive  definite  with  respect  to 
the  inner  product  a(*,  •),  the  conjugate  gradient  method  can  be  used.  It  is  obvious  that  the 
degree  of  parallelism  of  the  new  method  is  higher  than  that  of  the  symmetrized  multiplicative 
Schwarz  algorithms.  Here  we  have  considered  only  Poisson’s  equation;  the  extension  of 
the  algorithm  and  theory  to  general  variable  coefficient  symmetric  positive  definite  cases  is 
straightforward . 

The  analysis  for  the  symmetric  case  is  included  above  for  theoretical  interest.  In  practice, 
however,  we  do  not  believe  that  this  type  of  upwinding  preconditioning  would  offer  much 
improvement  over  the  classical  additive  Schwarz  method  for  symmetric  positive  definite 
problems.  Some  numerical  examples  are  included  in  the  last  section  of  the  paper. 
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3.2.  Analysis  for  the  general  nonsymmetric  case.  We  consider  the  general  non- 
symmetric  case  in  this  subsection.  The  techniques  are  mainly  borrowed  from  Cai  and  Wid- 
lund  [5,  6].  Let  us  begin  by  summarizing  the  main  results,  namely  that  the  operator  P 
is  uniformly  bounded  and  its  symmetric  part,  with  respect  to  the  inner  product  o(  ,  )?  is 
uniformly  positive  definite,  in  the  following  theorem.  This  theorem  provides  the  optimal 
convergence  of  several  Krylov  space  iterative  methods,  including  GCR  [16]  and  GMRES  [21] 
among  others. 

THEOREM  2.  There  exist  positive  constants  Ho,  c(i/o)  an d  C >  independent  of  the  mesh 
parameters  h  and  H,  such  that  if  H  <  Ho,  the  operator  P  is  uniformly  bounded,  i.e., 

\\Pu\\a  <  C\\u\\a,  Vu  €  14, 

and  its  symmetric  part  is  uniformly  positive  definite,  i.e., 

a(Pu,u)  >  c\\u\\2a,  Vu  €  14- 


To  prove  the  above  theorem,  we  need  a  result  from  Cai  and  Widlund  [6]  regarding  the 
optimality  of  the  additive  Schwarz  preconditioner. 

Lemma  3.2  (Cai  and  Widlund[6]).  There  exist  positive  constants  Ho,  c(H0 )  and  C, 

independent  of  the  mesh  parameters,  such  that  if  H  <  H0 

HErt«||0<C|H|o,  VueVh, 

i= 0 


and 

E II TAl  >  c||uE,  v«  €  u. 

i- 0 


We  next  present  a  number  of  useful  lemmas  before  giving  the  proof  of  the  main  theorem 
later  in  this  subsection.  The  following  lemma  says  that  the  symmetric  part  of  T,  is  positive 
definite  if  the  size  of  the  subdomains,  i.e.  H,  is  sufficiently  small.  The  proof  is  relatively 
simple,  and  therefore  not  included.  The  constant  C  appearing  in  the  lemma  depends  on  the 
coefficients  /3(x)  and  c(x)  of  the  elliptic  operator  L. 

LEMMA  3.3.  There  exists  a  positive  constant  C,  independent  of  the  mesh  parameters, 

such  that 

a(u,Tiu)  >  (1  -  CfOIITiuli;  -  Cff|Mi;(a;), 

for  any  i ,  and  u  € 

The  contribution  from  the  first  and  zeroth  order  terms  of  the  elliptic  operator  L  is 
estimated  in  the  next  lemma.  We  prove  that  the  contribution  is  of  lower  order  in  H. 

LEMMA  3.4.  There  exists  a  positive  constant  C,  independent  of  the  mesh  parameters  h 
and  H,  such  that  for  any  i,j  ±  0  for  which  ft]  and  ft'-  overlap 
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1.  siTiU^^CHmuWi  +  WTjuWl) 

2.  s(T,T,u,T,u)  <  G'//(||I'u||*+ 

S.  s(u,  TiTju)  <  CH  (||2>E  +  ll“C(o'l) 

for  all  u  €  14.  The  same  estimates  hold  if  the  bilinear  form  s(v)  is  replaced  by  the  bilinear 
form  c(-,  •). 

We  leave  the  proof  of  this  lemma  to  the  interested  reader.  The  basic  idea  of  the  proof  is  to 
use  that  IITHI^n')  <  CH\\Tiu\\a(^y  for  any  /  ^  0.  As  in  the  previous  lemma,  the  constants 
C  depend  on  the  coefficients  /3(x)  and  c(x)  of  the  elliptic  operator  L.  Using  Lemmas  3.3 
and  3.4,  we  now  proceed  to  give  a  lower  bound  of  the  two-subdomain  multiplicative  Schwarz 
operator  Pij. 

LEMMA  3.5.  There  exists  a  positive  constant  C ,  independent  of  the  mesh  parameters  h 
and  H,  such  that  for  any  i,j  for  which  fl-  and  Slj  overlap 

o(Ptf«,u)  >  (i  -  CH )  ||I}«||J  +  (i  -  CH)  p>|p.  -  CH\\u\\lfa,y 

for  any  u  6  14  • 

Proof.  We  first  note,  by  using  the  definition  of  the  operators  T;  and  Tj  and  the  fact  that 
Hv)  =  o(v)  +  s(v)  +  c(v)>  that 

a(PijU ,  u )  =  a(Tiu ,  u )  +  a(Tju,  u )  -  a(T;TjU,  u ) 

=  a(Tt-u,  u )  +  a(TjU ,  u )  —  a(T,u,  TjU )  + 

s(TiU,  TjU )  -  c(TiU,  TjU )  +  2 s(TiTjU, Txu)  + 


s{u, TiTju)  +  c(u, TiTju). 

The  desired  proof  follows  by  using  Lemmas  3.3  and  3.4.  □ 

We  are  now  ready  to  prove  the  main  theorem  of  this  subsection.  The  upper  bound  is 
easy.  It  can  be  seen  that 

Pij  =  Ti  +  (I  —  Ti)Tj. 


By  using  the  fact  that  (/  -  Tt)  is  uniformly  bounded,  we  obtain 

\\PijU\\a  <  C(\\TiU\\a  +  ||7>||a). 

The  upper  bound  of  P  can  then  be  obtained  by  summing  the  above  estimate  for  all  possible 
pairs  of  subdomains  and  using  Lemma  3.2.  To  establish  the  lower  bound,  we  sum  the 
estimate  in  Lemma  3.5  and  use  the  lower  bound  part  of  Lemma  3.2,  and  the  assumption 
that  H  is  sufficiently  small. 
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Fig.  1.  The  term  TiTj  is  kept  in  the  Schwarz  polynomial  only  if  the  flow  goes  from  £2;*  to  £2*. 


3.3.  A  weighted  local  multiplicative  algorithm.  In  this  subsection,  we  introduce  a 
variant  of  the  local  multiplicative  algorithm  that  is  particularly  useful  for  fluid  flow  problems. 
The  basic  philosophy  is  the  same  as  in  the  design  of  any  upwinding  type  discretization 
schemes.  We  first  note  that  the  operator  P  has  the  following,  more  explicit,  form 

(8)  P=  Y,  Ti-  E  TiTj. 

0<i<N 

In  other  words,  P  is  equal  to  the  regular  two-level  additive  Schwarz  operator  plus  some  sec¬ 
ond  order  perturbation  terms.  Since  the  additional  second  order  terms  enhance  the  nearest 
neighbor  communication,  we  therefore  believe  they  will  make  the  overall  convergence  faster 
for  the  classical  additive  Schwarz  algorithms.  This  observation  will  be  confirmed  by  a  num¬ 
ber  of  numerical  experiments  in  the  next  section.  Borrowing  a  term  from  the  Streamline 
Upwind  Petrov-Galerkin  (SUPG)  methods  [18,  20],  the  second  order  terms  TtTj,  if  used  prop¬ 
erly,  “stabilize”  the  preconditioner  when  solving  convection-diffusion  equations.  The  SUPG 
method  also  suggests  the  following  version  of  the  algorithm  with  weights  in  the  upwinding 
directions.  Let 

(9)  r=  E  r,-  E 

0<i<N  1  <ift<N 

Here  fXij  equals  zero  or  //,  where  0<//<1.0isa  constant.  The  choice  of  fXij  depends  on 
the  direction  of  the  flow.  The  intuition  is  that  if  the  flow  goes  from  ti'j  to  O]  and  if  these 
two  subdomains  are  neighbors,  then  we  set  Hij  to  be  a  positive  constant  /x,  and  fXji  to  zero. 
We  have  not  exploited  the  possibility  of  using  different  for  different  pairs  of  subdomains. 
Of  course,  if  fi-  and  O’-  are  not  neighbors  we  then  set  fi{j  =  jijt  =  0.  The  motivation  here 
is  exactly  the  same  as  in  using  the  upwinding  techniques  in  the  solution  of  problems  that 
involve  hyperbolic  components.  A  difference  is  that  the  usual  upwinding  techniques  are 
used  only  at  the  discretization  level,  and  our  “upwinding”  is  introduced  as  a  way  to  define 
the  preconditioning  polynomial.  It  is  understandable  that,  for  problems  that  have  a  strong 
characteristic  direction,  such  as  convection-diffusion  problems,  some  kind  of  upwinding  can 
speed  up  the  convergence. 

We  now  propose  a  heuristic  method  to  be  used  to  determine  the  flow  direction.  Let 
fi(x)  =  (bi(x), . . . ,  bd(x))T  be  the  characteristic  vector  of  the  flow.  For  each  pair  of  neigh¬ 
boring  subdomains,  we  choose  a  curve,  such  as  T,j  in  Fig.  1  or  another  curve  in  fh  ff|  Qj, 
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that  more  or  less  separates  the  subdomains.  Since  we  are  defining  preconditioners,  it  is  not 
necessary  to  find  the  precise  separating  curve.  Let  n,-j,  defined  on  I\,-,  be  the  unit  vector 
pointing  from  subdomain  Clj  to  0, .  We  define  the  parameters  y tj  by  looking  at  the  sign  of 
a  line  integral 


if  J  (3(x)  ■  riijds  >  0 
otherwise, 


where  the  integral  is  taken  along  the  curve  Tij. 

4.  Numerical  experiments.  In  this  section,  we  present  some  experimental  results  to 
numerically  understand  the  local  multiplicative  Schwarz  algorithms,  and  to  compare  them 
with  the  classical  additive  and  multiplicative  Schwarz  algorithms  for  both  symmetric  pos¬ 
itive  definite  and  nonsymmetric  problems.  Although  the  proposed  methods  belong  to  the 
class  of  optimal  preconditioners,  some  effort  is  needed  to  obtain  the  best  performance  for 
a  particular  test  problem,  especially  in  the  selection  of  the  parameter  y  in  both  symmetric 
and  nonsymmetric  cases.  We  note  that  y  =  1.0  is  usually  not  a  good  choice.  As  mentioned 
earlier,  our  optimal  convergence  theory  requires  that  the  coarse  grid  is  sufficiently  fine,  how¬ 
ever,  in  practice,  especially  in  the  nonsymmetric  cases,  it  is  quite  difficult  to  find  a  coarse 
grid  of  proper  size  such  that  the  convergence  is  not  slower  than  the  purely  local  (i.e.,  without 
a  coarse  space)  Schwarz  algorithms. 

We  consider  the  following  model  problem  on  the  unit  square 

f  Lu  =  f  in  0  =  (0,1)  x  (0,1), 

(  u  =  0  on  di 2. 

The  right-hand  side  /  is  always  chosen  such  that  the  exact  solution  is  u-  xexysin(‘Kx)sin{'Ky). 
The  coefficients  of  L  will  be  specified  later  for  each  test  problem.  We  use  an  256  x  256  uni¬ 
form  fine  mesh  throughout  this  section.  The  number  of  subdomains  is  64  in  all  test  cases, 
i.e.,  we  use  an  8  x  8  uniform  partitioning  of  the  domain  into  subdomains,  with  a  uniform  2 h 
overlap  between  each  neighboring  subdomains,  where  h  —  1/256.  In  our  experiments,  the 
coarse  grid  linear  system  and  all  the  subdomain  linear  systems  are  solved  exactly  by  using  a 
sparse  linear  system  solver  from  the  Argonne  National  Laboratory  software  package  PETSc 
of  Gropp  and  Smith  [19].  All  the  Schwarz  methods  are  used  as  left  preconditioners  for  the 
CG  method,  or  the  non-restarted  GMRES  method,  with  a  zero  initial  guess.  We  stop  the 
CG  or  GMRES  iteration  as  soon  as  the  preconditioned  initial  residual  is  reduced  by  a  factor 
of  10'5.  We  discretize  the  PDE  at  both  the  fine  and  the  coarse  levels  by  the  usual  five-point 
central,  or  upwinding,  finite  difference  method. 

Example  0.  We  first  test  the  algorithms  on  a  simple  Poisson’s  equation.  (This  is  not 
what  the  new  algorithm  is  designed  for.)  In  Fig.  2,  we  show  that  the  new  algorithm  is  slower 
than  the  multiplicative  Schwarz  algorithm,  but  with  parameter  y  =  0.3,  faster  than  the 
additive  Schwarz  algorithm.  Without  using  a  proper  y,  the  algorithm  can  be  slow.  An  8  x  8 
coarse  solve  is  included  in  all  cases.  The  multiplicative  Schwarz  algorithm  is  symmetrized  in 
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102 


Fig.  2.  The  curves  show  the  iteration  history  of  the  additive ,  multiplicative  and  the  local  multiplicative 
Schwarz  preconditioned  CG  methods.  The  solid  curve  represents  the  local  multiplicative  Schwarz  method, 
the  dashed  curve  represents  the  additive  Schwarz  and  the  broken  curve  represents  the  multiplicative  Schwarz 
method . 

order  to  be  able  to  use  CG.  We  remark  again  that  even  though  the  symmetrized  multiplicative 
Schwarz  is  the  fastest  among  the  three  algorithms,  it  has  the  lowest  parallelism.  The  per-step 
arithmetic  cost  of  the  new  algorithm  is  higher  due  the  repetition  of  the  subdomain  solves. 

Example  1.  We  let  Lu  =  —V  •  (Vu)  +  V  •  (fiu),  where  fi  =  (&i,  b2)  is  a  constant  vector 
with  b2  =  100.0,  or  —100.0.  We  discretize  the  PDE  with  the  usual  five-point  central  finite 
difference  method.  We  first  compare  the  new  method,  with  fi  =  0.5,  with  the  additive  and 
multiplicative  Schwarz  methods  without  coarse  space  in  the  case  fi  =  (100,100).  For  the 
multiplicative  Schwarz,  we  order  the  subdomains  by  the  natural  ordering.  No  coloring  is 
incorporated  in  the  implementation.  The  results  are  presented  in  the  left  figure  of  Fig.  3.  It 
can  be  seen  clearly  that ,  for  fi  =  (100, 100),  the  multiplicative  Schwarz  method  is  the  fastest 
of  the  three.  However,  the  situation  changes,  if  we  let  fi  —  (—100,  —100)  and  do  not  change 
the  subdomain  ordering  in  the  multiplicative  Schwarz  method.  As  shown  in  the  left  figure 
of  Fig.  4,  the  new  method  becomes  the  fastest  of  the  three.  Apparently,  the  changing  of  the 
flow  characteristics  hurts  the  convergence  of  the  multiplicative  Schwarz  algorithm,  but  the 
new  method  does  not  suffer. 

We  next  present  cases  when  coarse  spaces  are  included  in  the  preconditioners.  The 
optimal  convergence  theory  for  all  three  Schwarz  algorithms  requires  that  the  coarse  grid 
is  sufficiently  fine.  Our  numerical  experiments  suggest  that  they  in  fact  need  coarse  grid 
of  different  sizes,  i.e.,  a  sufficiently  fine  coarse  grid  for  one  Schwarz  method  may  not  be 
sufficiently  fine  for  the  others.  We  say  a  coarse  space  is  “good”  if  the  total  number  of 
iterations  is  smaller  than  without  it.  A  coarse  grid,  not  sufficiently  fine,  usually  leads  to  a 
slower  convergence  in  all  Schwarz  type  methods.  In  the  right  figure  of  Fig.  3,  we  present 
three  Schwarz  algorithms  with  three  different  coarse  grid  sizes,  namely  the  multiplicative 
Schwarz  with  an  16  x  16  coarse  grid;  the  additive  Schwarz  with  an  32  x  32  coarse  grid;  the 
new  method  with  an  64  x  64  coarse  grid,  and  fi  =  0.5.  Comparing  the  right  figures  in  Fig.  3 
and  Fig.  4,  we  observe  that  the  multiplicative  Schwarz  method  with  a  coarse  space  of  proper 
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Fig.  3.  Central  finite  difference  discretization  with  (3  —  (100,100).  The  left  figure  shows  cases  without 
coarse  spaces,  and  the  right  figure  shows  cases  with  coarse  spaces.  The  curves  show  the  iteration  history 
of  the  additive  (dashed),  multiplicative  (broken)  and  the  local  multiplicative  (solid)  Schwarz  preconditioned 
GMRES  methods. 


Fig.  4.  Central  finite  difference  discretization  with  (3  =  (-100,-100).  The  left  figure  shows  cases  without 
coarse  spaces,  and  the  right  figure  shows  cases  with  coarse  spaces.  The  curves  show  the  iteration  history  of  the 
additive  (dashed),  multiplicative  (broken)  and  the  local  multiplicative  (solid)  Schwarz  preconditioned  GMRES 
methods. 

size  is  always  the  best  of  the  three. 

Example  2.  We  let  Lu  =  -V  •  (Vu)  +  V  •  (/?«),  where  0  =  (61,62)  is  a  constant 
vector  with  61?  62  =  1000.0  or  -1000.0.  The  equation  is  discretized  by  the  usual  five-point 
upwinding  finite  difference  method.  We  run  the  test  code  without  using  coarse  spaces  for  four 
different  constant  flow  directions.  As  before,  for  the  multiplicative  Schwarz  preconditioner, 
we  order  the  subdomains  in  the  natural  ordering.  No  coloring  is  assumed.  For  the  new 
algorithm  we  use  p  =  0.7.  The  residual  history  is  presented  in  Fig.  5.  It  is  clear  that  if 
the  subdomain  ordering  does  not  follow  the  flow  characteristic  direction  the  convergence  of 
multiplicative  Schwarz  becomes  significantly  worse  than  in  a  case  when  the  ordering  follows 
the  flow.  Additive  Schwarz  is  not  sensitive  at  all  to  such  an  ordering,  but  is  quite  slow.  The 
new  algorithm  does  not  need  any  special  attention  to  the  ordering,  and  converges  faster  than 
(a)  the  additive  Schwarz  algorithm  in  all  four  cases;  (b)  the  worst  case  of  the  multiplicative 
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Additive  Schwarz  preconditioned  GMRES 


Fig.  5.  The  figures  show  the  iteration  history  of  the  additive  (upper  left),  multiplicative  (upper  right)  and 
the  new  (lower  left)  Schwarz  preconditioned  GMRES  method.  The  line  types  correspond  to  the  flow  directions, 
i.e.,  solid  lines  (3  =  (-1000,-1000),  dashed  lines  f3  =  (1000,1000),  dotted  lines  /?  =  (1000,-1000)  and  broken 
lines /3=(- 1000,1000). 

Schwarz  algorithm. 

Our  experience  suggests  that  it  is  by  no  means  easy  to  find  a  coarse  space  of  proper 
size  in  the  case  that  the  PDE  is  discretized  by  upwinding  finite  difference  methods.  Further 
theoretical  and  numerical  investigation  of  this  situation  is  underway. 

5.  Conclusion.  In  this  paper,  we  introduce  a  new  class  of  overlapping  domain  decom¬ 
position  methods  for  solving  scalar  convection-diffusion  problems.  The  method  improves 
the  classical  multiplicative  Schwarz  methods  by  reducing  their  sensitivity  with  respect  to 
the  flow  direction.  For  the  Galerkin  finite  element  discretization,  we  prove  that  the  method 
is  optimal  in  the  sense  that  the  convergence  rate  is  independent  of  the  mesh  size  and  also 
the  number  of  subdomains  in  both  R 2  and  R3.  Numerical  experiments  are  also  reported  to 
illustrate  the  rankings  of  the  methods  and  some  open  questions  are  identified. 

Acknowledgement.  The  authors  are  indebted  to  D.  Keyes  and  0.  Widlund  for  reading 
a  draft  of  the  paper  and  also  many  helpful  discussions. 
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